Ideal Ratio Mask Estimation Using Deep Neural Networks for Monaural Speech Segregation in Noisy Reverberant Conditions
نویسندگان
چکیده
Monaural speech segregation is an important problem in robust speech processing and has been formulated as a supervised learning problem. In supervised learning methods, the ideal binary mask (IBM) is usually used as the target because of its simplicity and large speech intelligibility gains. Recently, the ideal ratio mask (IRM) has been found to improve the speech quality over the IBM. However, the IRM was originally defined in anechoic conditions and did not consider the effect of reverberation. In this paper, the IRM is extended to reverberant conditions where the direct sound and early reflections of target speech are regarded as the desired signal. Deep neural networks (DNNs) is employed to estimate the extended IRM in the noisy reverberant conditions. The estimated IRM is then applied to the noisy reverberant mixture for speech segregation. Experimental results show that the estimated IRM provides substantial improvements in speech intelligibility and speech quality over the unprocessed mixture signals under various noisy and reverberant conditions.
منابع مشابه
Binaural deep neural network classification for reverberant speech segregation
While human listening is robust in complex auditory scenes, current speech segregation algorithms do not perform well in noisy and reverberant environments. This paper addresses the robustness in binaural speech segregation by employing binary classification based on deep neural networks (DNNs). We systematically examine DNN generalization to untrained configurations. Evaluations and comparison...
متن کاملDeep Ensemble Learning for Monaural Speech Separation
Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN) based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences be...
متن کاملCombining monaural and binaural evidence for reverberant speech segregation
Most existing binaural approaches to speech segregation rely on spatial filtering. In environments with minimal reverberation and when sources are well separated in space, spatial filtering can achieve excellent results. However, in everyday environments performance degrades substantially. To address these limitations, we incorporate monaural analysis within a binaural segregation system. We us...
متن کاملSpeech Segregation based on Binary Classification
Speech segregation is a fundamental challenge in speech and audio processing. This AFOSR project aimed to develop a speech segregation system that can potentially improve speech intelligibility in noise for human listeners. Motivated by the perceptual principles of auditory scene analysis and the speech intelligibility studies of ideal time-frequency masking, the project sought to develop a cla...
متن کاملBinaural Reverberant Speech Separation Based on Deep Neural Networks
Supervised learning has exhibited great potential for speech separation in recent years. In this paper, we focus on separating target speech in reverberant conditions from binaural inputs using supervised learning. Specifically, deep neural network (DNN) is constructed to map from both spectral and spatial features to a training target. For spectral features extraction, we first convert binaura...
متن کامل